Creators/Authors contains: "Hong, Pengyu"

  1. Free, publicly-accessible full text available December 1, 2026
  2. Accurate molecular property prediction relies on effective and proficient representation learning. It is crucial to incorporate the diverse relationships between molecules that are characterized by multi-similarity (self-similarity and relative similarities) (Wang et al., 2019). However, current molecular representation learning methods fall short in exploring multi-similarity and often underestimate the complexity of the relationships between molecules. Additionally, previous multi-similarity approaches require positive and negative pairs to be specified in order to assign distinct predefined weights to different relative similarities, which can introduce bias. In this work, we introduce the Graph Multi-Similarity Learning for Molecular Property Prediction (GraphMSL) framework, along with a novel approach to formulating a generalized multi-similarity metric without the need to define positive and negative pairs. In each chemical modality space under consideration (e.g., molecular depiction image, fingerprint, NMR, and SMILES), we first define a self-similarity metric (i.e., the similarity between an anchor molecule and another molecule) and then transform it into a generalized multi-similarity metric for the anchor through a pair weighting function. GraphMSL validates the efficacy of the multi-similarity metric on MoleculeNet datasets. Furthermore, the metrics of all modalities are integrated into a multimodal multi-similarity metric, which shows potential to improve performance. Moreover, the focus of the model can be redirected or customized by altering the fusion function. Last but not least, post-hoc analyses of the learned representations show that GraphMSL is effective in drug discovery evaluations.
  3. Nuclear magnetic resonance (NMR) spectroscopy plays an essential role in deciphering molecular structure and dynamic behavior. While AI-enhanced NMR prediction models hold promise, challenges persist in tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces Knowledge-Guided Multi-Level Multimodal Alignment with Instance-Wise Discrimination (K-M3AID), which establishes correspondences between two heterogeneous modalities: molecular graphs and NMR spectra. K-M3AID employs a dual-coordinated contrastive learning architecture with three key modules: a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, K-M3AID introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module. In addition, K-M3AID demonstrates that skills acquired during node-level alignment have a positive impact on graph-level alignment, indicating that meta-learning is an inherent property of the framework. Empirical validation underscores the effectiveness of K-M3AID in multiple zero-shot tasks.
  4. Deep learning-based optical flow (DLOF) extracts features from video frames with deep convolutional neural networks to estimate the inter-frame motion of objects. For densely labeled systems, DLOF computes velocity fields more accurately than particle image velocimetry (PIV).
  5. Stereoselective reactions have played a vital role in the emergence of life, evolution, human biology, and medicine. However, for a long time, most industrial and academic efforts followed a trial-and-error approach to asymmetric synthesis in stereoselective reactions. In addition, most previous studies have focused qualitatively on the influence of steric and electronic effects on stereoselective reactions. As a result, quantitatively understanding the stereoselectivity of a given chemical reaction remains extremely difficult. As a proof of principle, this paper develops a novel composite machine learning method for quantitatively predicting enantioselectivity, i.e., the degree to which one enantiomer is preferentially produced in a reaction. Specifically, machine learning methods widely used in data analytics, including Random Forest, Support Vector Regression, and LASSO, are utilized. In addition, Bayesian optimization and permutation importance tests are used for accurate prediction and for an in-depth understanding of the reactions. Finally, the proposed composite method approximates the key features of the available reactions using Gaussian mixture models, which help identify suitable machine learning methods for new reactions. Case studies on real stereoselective reactions show that the proposed method is effective and provides a solid foundation for further application to other chemical reactions.
  6. A machine learning model for reliably calculating director fields from raw experimental images of active nematics. The model is accurate, robust to noise, and generalizable, and it enhances downstream analyses such as the detection and tracking of topological defects.
  7. This work considers the task of representation learning on the attributed relational graph (ARG). Both the nodes and edges in an ARG are associated with attributes/features, allowing ARGs to encode rich structural information widely observed in real applications. Existing graph neural networks have limited ability to capture complex interactions within local structural contexts, which prevents them from taking advantage of the expressive power of ARGs. We propose the motif convolution module (MCM), a new motif-based graph representation learning technique that better utilizes local structural information. One of MCM’s advantages over existing motif-based models is its ability to handle continuous edge and node features. MCM builds a motif vocabulary in an unsupervised way and deploys a novel motif convolution operation to extract the local structural context of individual nodes, which is then used to learn higher-level node representations via a multilayer perceptron and/or message passing in graph neural networks. In classifying synthetic graphs, our approach is substantially better at capturing structural context than other graph learning approaches. We also demonstrate the performance and explainability advantages of our approach by applying it to several molecular benchmarks.
  8. Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that the knowledge contained in a dataset can be represented consistently. Dense embeddings trained on KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fall short of providing a systematic solution for the global consistency of knowledge representation. We developed a mathematical language for KGs, which we term Knowledgebra, based on an observation of their inherent algebraic structure. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. To our knowledge, by applying abstract algebra to statistical learning, this work develops the first formal language for general knowledge graphs and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.
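Item 2 (GraphMSL) describes turning a self-similarity between an anchor molecule and each other molecule into a generalized multi-similarity through a pair weighting function, and then fusing the per-modality metrics. The abstract does not give either function, so the sketch below is only an illustration under stated assumptions: cosine similarity over precomputed embeddings as the self-similarity, a temperature-scaled softmax (hypothetical parameter `tau`) as the pair weighting, and a weighted average as the fusion function. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed self-similarity: cosine similarity between two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def multi_similarity(anchor: Sequence[float],
                     others: List[Sequence[float]],
                     tau: float = 0.1) -> List[float]:
    """Map anchor-to-other self-similarities to relative weights.

    Assumed pair weighting: a temperature-scaled softmax, so every pair
    contributes and no pair has to be labeled positive or negative.
    """
    scaled = [cosine(anchor, o) / tau for o in others]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(per_modality: Dict[str, List[float]],
         weights: Dict[str, float]) -> List[float]:
    """Hypothetical fusion function: weighted average across modalities.

    Changing `weights` shifts which modality dominates the fused metric.
    """
    norm = sum(weights[m] for m in per_modality)
    size = len(next(iter(per_modality.values())))
    return [sum(weights[m] * per_modality[m][i] for m in per_modality) / norm
            for i in range(size)]


if __name__ == "__main__":
    # Toy 2-D embeddings of one anchor and two other molecules per modality.
    fp = multi_similarity([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
    img = multi_similarity([0.5, 0.5], [[0.5, 0.4], [0.1, 0.9]])
    print(fuse({"fingerprint": fp, "image": img},
               {"fingerprint": 0.7, "image": 0.3}))
```

Because each per-modality metric sums to 1, the weighted average does too, so reweighting modalities redirects the model's focus without changing the scale of the metric.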
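Item 3 (K-M3AID) is built on contrastive alignment between molecular graphs and NMR spectra. As a generic reference point only, a symmetric InfoNCE (CLIP-style) loss for graph-level alignment can be sketched as follows, assuming each molecule's graph embedding and spectrum embedding have already been computed and that row i of each list is a matched pair. This is not the paper's objective: the node-level module, the communication channel, and the knowledge-guided instance-wise discrimination are not shown, and `info_nce` and `tau` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale an embedding to unit length so dot products are cosines."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(graph_emb: List[Vector], nmr_emb: List[Vector],
             tau: float = 0.1) -> float:
    """Symmetric InfoNCE loss; row i of each list is a matched pair."""
    g = [normalize(v) for v in graph_emb]
    s = [normalize(v) for v in nmr_emb]
    n = len(g)
    logits = [[sum(a * b for a, b in zip(g[i], s[j])) / tau for j in range(n)]
              for i in range(n)]

    def cross_entropy(rows: List[Vector]) -> float:
        # Mean of -log softmax(row)[i]: the matched pair sits on the diagonal.
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_z - row[i]
        return loss / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the graph-to-spectrum and spectrum-to-graph directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))


if __name__ == "__main__":
    graphs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    spectra = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    print(info_nce(graphs, spectra))                     # aligned pairs
    print(info_nce(graphs, spectra[1:] + spectra[:1]))   # mismatched pairs
```

Correctly paired graphs and spectra give a lower loss than a shuffled pairing, which is the signal that pulls matching modalities together during training.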
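Item 5 mentions permutation importance tests. The model-agnostic form of that test can be sketched in plain Python: shuffle one descriptor column, re-score a fitted model, and record how much the error grows. The toy linear model and data below are hypothetical stand-ins for the Random Forest, Support Vector Regression, or LASSO regressors named in the abstract.

```python
import random
from typing import Callable, List

Row = List[float]


def mse(model: Callable[[Row], float], X: List[Row], y: List[float]) -> float:
    """Mean squared error of a fitted model on descriptor rows X."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], float], X: List[Row],
                           y: List[float], n_repeats: int = 10,
                           seed: int = 0) -> List[float]:
    """Average rise in MSE when each descriptor column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        rises = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between descriptor j and y
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, column)]
            rises.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(rises) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical descriptors; the lambda stands in for a fitted regressor
    # that predicts enantioselectivity from the first descriptor only.
    X = [[i / 10, (i * 7 % 5) / 5] for i in range(20)]
    y = [3.0 * row[0] for row in X]
    print(permutation_importance(lambda r: 3.0 * r[0], X, y))
```

Because the stand-in model ignores the second descriptor, shuffling that column leaves the error unchanged and its importance is exactly zero, while the first descriptor shows a positive importance.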
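Item 6 uses the computed director fields to detect topological defects. A common way to do that, sketched below as an assumed downstream step rather than the authors' pipeline, is to sum director-angle differences around a closed loop. Because a nematic director is headless (θ and θ + π describe the same orientation), each difference is wrapped into [-π/2, π/2) before summing, so the total winding divided by 2π yields charges such as +1/2 or -1/2. How the loop is sampled from the director field is an assumption.

```python
import math
from typing import List


def wrap_half_pi(d: float) -> float:
    """Map an angle difference into [-pi/2, pi/2); directors are headless."""
    return (d + math.pi / 2) % math.pi - math.pi / 2


def winding_charge(thetas: List[float]) -> float:
    """Topological charge from director angles sampled in order around a loop."""
    total = 0.0
    for k in range(len(thetas)):
        # Close the loop by pairing the last sample with the first.
        total += wrap_half_pi(thetas[(k + 1) % len(thetas)] - thetas[k])
    return total / (2 * math.pi)


if __name__ == "__main__":
    loop = [2 * math.pi * k / 16 for k in range(16)]  # polar angles on a circle
    print(winding_charge([p / 2 for p in loop]))   # approximately +0.5
    print(winding_charge([-p / 2 for p in loop]))  # approximately -0.5
```

A uniform director field around the loop gives zero charge, so scanning small loops over the field and keeping those with charge near ±1/2 localizes the defects for tracking.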
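Item 7 describes a motif convolution that scores each node's local structural context, including continuous node and edge features, against a motif vocabulary. The abstract does not define the operation, so the following is a deliberately simplified, hypothetical toy: a node's context is its own feature vector plus neighbor features scaled by a scalar edge attribute, and each "motif" is a fixed prototype vector scored by a dot product. In MCM the vocabulary is built without supervision and the real operation differs; every name here is illustrative.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Edges = Dict[Tuple[int, int], float]  # undirected edge -> continuous attribute


def local_context(node: int, feats: Dict[int, Vector], edges: Edges) -> Vector:
    """Node feature plus neighbor features weighted by the edge attribute."""
    ctx = list(feats[node])
    for (u, v), w in edges.items():
        if node in (u, v):
            nbr = v if u == node else u
            ctx = [c + w * f for c, f in zip(ctx, feats[nbr])]
    return ctx


def motif_convolution(feats: Dict[int, Vector], edges: Edges,
                      motifs: List[Vector]) -> Dict[int, Vector]:
    """Per-node response to each hypothetical motif prototype (dot product)."""
    out = {}
    for node in feats:
        ctx = local_context(node, feats, edges)
        out[node] = [sum(c * m for c, m in zip(ctx, motif)) for motif in motifs]
    return out


if __name__ == "__main__":
    feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
    edges = {(0, 1): 0.5, (1, 2): 2.0}
    print(motif_convolution(feats, edges, [[1.0, 0.0], [0.0, 1.0]]))
```

The per-node response vectors play the role of structural-context features that a downstream multilayer perceptron or message-passing layer could consume.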
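Item 8 argues that relation embeddings should form a semigroup, instantiated in SemE with matrices. A minimal sketch of why matrices fit: composing two relations is a matrix product, which is closed and associative, and a chain rule "r1 then r2 implies r3" can be encouraged with a penalty on ||M_r2 M_r1 - M_r3||². The squared-distance score and the penalty form below are assumptions for illustration, not SemE's published definitions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Relation composition as a matrix product (closed and associative)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(m: Matrix, v: Vector) -> Vector:
    """Map an entity embedding through a relation matrix."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def score(rel: Matrix, head: Vector, tail: Vector) -> float:
    """Assumed plausibility: squared distance, smaller means a better fit."""
    return sum((p - t) ** 2 for p, t in zip(apply(rel, head), tail))


def rule_penalty(r1: Matrix, r2: Matrix, r3: Matrix) -> float:
    """Assumed regularizer for the rule 'r1 then r2 implies r3'."""
    comp = matmul(r2, r1)  # apply r1 first, then r2
    return sum((comp[i][j] - r3[i][j]) ** 2
               for i in range(len(r3)) for j in range(len(r3[0])))


if __name__ == "__main__":
    r1 = [[1.0, 1.0], [0.0, 1.0]]
    r2 = [[2.0, 0.0], [0.0, 1.0]]
    r3 = matmul(r2, r1)
    head = [1.0, 0.0]
    print(rule_penalty(r1, r2, r3), score(r3, head, apply(r2, apply(r1, head))))
```

Neither r1 nor the composed relations need to be invertible, which is one reason a semigroup, rather than a group, is the weaker and more general requirement.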